A Pipeline Japanese Entity Linking System with Embedding Features

Author

  • Shuangshuang Zhou
Abstract

Entity linking (EL) is the task of connecting mentions in texts to entities in a large-scale knowledge base such as Wikipedia. In this paper, we present a pipeline system for Japanese EL that consists of two standard components, namely candidate generation and candidate ranking. We investigate several techniques for each component, using a recently developed Japanese EL corpus. For candidate generation, we find that a concept dictionary built from Wikipedia anchor texts is more effective than methods based on surface similarity. For candidate ranking, we verify that a set of features used in English EL is effective in Japanese EL as well. In addition, because we use a corpus that links Japanese mentions to Japanese Wikipedia entries, we can draw rich context information from Japanese Wikipedia articles to aid mention disambiguation. This was not directly possible with previous EL corpora, which associate mentions with English Wikipedia entities. We take advantage of this by exploring several embedding models that encode the context information of Wikipedia entities, and show that they improve candidate ranking. Overall, our system achieves 82.27% accuracy, significantly outperforming previous work.
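The two-stage pipeline summarized above can be illustrated with a toy sketch. It assumes a concept dictionary built by counting (anchor text, linked article title) pairs harvested from Wikipedia hyperlinks, uses the normalized count P(entity | mention) as a prior, and ranks candidates by a fixed weighted sum of that prior and the cosine similarity between a mention-context vector and precomputed entity embeddings. The anchor pairs, vectors, weights, and all function names (`build_concept_dictionary`, `generate_candidates`, `rank_candidates`) are hypothetical. The system described in the abstract combines a larger feature set with learned embedding models, so the fixed linear score here is only a simplified stand-in, not the authors' method.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: (anchor text, linked Wikipedia title) pairs, as might
# be harvested from hyperlinks in Japanese Wikipedia articles.
ANCHORS = [
    ("東京", "東京都"),
    ("東京", "東京都"),
    ("東京", "東京駅"),
    ("東大", "東京大学"),
    ("東京大学", "東京大学"),
]

# Hypothetical precomputed entity embeddings (2-d for readability).
ENTITY_VECS = {
    "東京都": [1.0, 0.0],
    "東京駅": [0.0, 1.0],
    "東京大学": [0.5, 0.5],
}


def build_concept_dictionary(anchor_pairs):
    """Map each anchor string to a Counter of the titles it links to."""
    dictionary = defaultdict(Counter)
    for anchor, title in anchor_pairs:
        dictionary[anchor][title] += 1
    return dictionary


def generate_candidates(mention, dictionary, k=5):
    """Return up to k (entity, prior) pairs with prior = P(entity | mention)."""
    counts = dictionary.get(mention)  # .get does not insert into a defaultdict
    if not counts:
        return []  # unseen mention: no candidates (could be treated as NIL)
    total = sum(counts.values())
    return [(entity, c / total) for entity, c in counts.most_common(k)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is empty or all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_candidates(candidates, context_vec, entity_vecs, w_prior=0.5, w_ctx=0.5):
    """Sort candidates by a weighted sum of prior and context similarity."""
    scored = []
    for entity, prior in candidates:
        sim = cosine(context_vec, entity_vecs.get(entity, []))
        scored.append((entity, w_prior * prior + w_ctx * sim))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    concept_dict = build_concept_dictionary(ANCHORS)
    cands = generate_candidates("東京", concept_dict)
    # Hypothetical context vector for a sentence about trains and platforms.
    ranked = rank_candidates(cands, [0.1, 0.9], ENTITY_VECS)
    print(ranked[0][0])  # → 東京駅
```

With these toy values the prior favours 東京都 (2 of 3 anchors), but the context vector lies much closer to the 東京駅 embedding, so the context-similarity term flips the ranking. A mention with no dictionary entry yields an empty candidate list.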

Similar resources

A Neural Network-Based System for Recognizing and Classifying Named Entities in Persian Texts

Named Entity Recognition (NER) is a fundamental task in natural language processing and also known as a subset of information extraction. We seek to locate and classify named entities in text into predefined categories such as the names of persons, organizations, locations, expressions of times, etc. Named Entity Recognition for English texts has been researched widely for the past years, howev...

CUNY-BLENDER TAC-KBP2010 Entity Linking and Slot Filling System Description

The CUNY-BLENDER team participated in the following tasks in TAC-KBP2010: Regular Entity Linking, Regular Slot Filling and Surprise Slot Filling task (per:disease slot). In the TAC-KBP program, the entity linking task is considered as independent from or a pre-processing step of the slot filling task. Previous efforts on this task mainly focus on utilizing the entity surface information and the...

Toward Socially-Infused Information Extraction: Embedding Authors, Mentions, and Entities

We present a novel neural network model for entity linking that exploits distributed representations of users, mentions, and entities. • Our system leverages social network structures by utilizing entity homophily to improve entity disambiguation. • Our neural network model is on par with the tree-based model (Yang and Chang 2015) with surface features, but it is much easier to add additional i...

A Novel Approach to Conditional Random Field-based Named Entity Recognition using Persian Specific Features

Named Entity Recognition is an information extraction technique that identifies named entities in a text. Three approaches have conventionally been used to extract named entities from text: rule-based, machine-learning-based, and hybrids of the two. Machine-learning-based methods perform well on Persian if they are trained with good features. To get good performanc...

Domino: SAIC's English Entity-Linking System

The Domino system was SAIC’s student-intern entry to the English Entity-Linking track of the 2012 TAC-KBP competition. This paper describes how Domino was developed using components from the CUNY-BLENDER system and discusses the features and rules that were added to Domino. It analyzes Domino’s performance, and suggests ways in which we plan to improve the system in the future. 1.Building the D...

Publication date: 2016